Bandit Linear Optimization for Sequential Decision Making and Extensive-Form Games

Abstract

Tree-form sequential decision making (TFSDM) extends classical one-shot decision making by modeling tree-form interactions between an agent and a potentially adversarial environment. It captures the online decision-making problems that each player faces in an extensive-form game, as well as Markov decision processes and partially-observable Markov decision processes where the agent conditions on observed history. Over the past decade, there has been considerable effort into designing online optimization methods for TFSDM. Virtually all of that work has been in the full-feedback setting, where the agent has access to counterfactuals, that is, information on what would have happened had the agent chosen a different action at any decision node. Little is known about the bandit setting, where that assumption is reversed (no counterfactual information is available), despite this latter setting being well understood for almost 20 years in one-shot decision making. In this paper, we give the first algorithm for the bandit linear optimization problem for TFSDM that offers both (i) linear-time iterations (in the size of the decision tree) and (ii) O(sqrt(T)) cumulative regret in expectation compared to any fixed strategy, at all times T. This is made possible by new results that we derive, which may have independent uses as well: 1) geometry of the dilated entropy regularizer, 2) autocorrelation matrix of the natural sampling scheme for sequence-form strategies, 3) construction of an unbiased estimator for linear losses for sequence-form strategies, and 4) a refined regret analysis for mirror descent when using the dilated entropy regularizer.
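
For readers who want the regret guarantee in symbols, one way to write it down is sketched below. The notation is an assumption made here for illustration and is not taken from this page: Q stands for the agent's set of sequence-form strategies, \ell^t for the loss vector chosen by the environment at time t, y^t for the (randomized) strategy the agent plays at time t, and \hat{x} for an arbitrary fixed comparator strategy.

```latex
% Bandit feedback: after playing y^t, the agent observes only the scalar loss
% \langle \ell^t, y^t \rangle, not the full vector \ell^t.
%
% Expected cumulative regret against any fixed strategy \hat{x} \in Q:
\[
  \mathbb{E}\!\left[\, \sum_{t=1}^{T} \langle \ell^t, y^t \rangle
      \;-\; \sum_{t=1}^{T} \langle \ell^t, \hat{x} \rangle \right]
  \;=\; O\!\left(\sqrt{T}\right)
  \qquad \text{for all } T \text{ and all } \hat{x} \in Q .
\]
```

Constants hidden by the O(·) notation, including any dependence on the size of the decision tree, are left unspecified in this sketch.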

Similar resources

Decision Support for Extensive Form Negotiation Games

This paper presents a tool, NEGEXT, for finding individual and group strategies to achieve certain goals while playing extensive form negotiation games. NEGEXT is used as a model-checking tool which investigates the existence of strategies in negotiation situations. We consider sequential and parallel combinations of such games also. Thus, it may aid students of negotiation in their understandi...

Distributionally Robust Optimization for Sequential Decision Making

The distributionally robust Markov Decision Process approach has been proposed in the literature, where the goal is to seek a distributionally robust policy that achieves the maximal expected total reward under the most adversarial joint distribution of uncertain parameters. In this paper, we study distributionally robust MDP where ambiguity sets for uncertain parameters are of a format that ca...

A Unification of Extensive-Form Games and Markov Decision Processes

We describe a generalization of extensive-form games that greatly increases representational power while still allowing efficient computation in the zero-sum setting. A principal feature of our generalization is that it places arbitrary convex optimization problems at decision nodes, in place of the finite action sets typically considered. The possibly-infinite action sets mean we must “forget”...

Extensive-Form Argumentation Games

Two prevalent approaches to automated negotiation are the application of game-theoretic notions and the argumentation-based angle; these two schemes are frequently at odds. An elegant view of argumentation is Dung’s abstract argumentation theory [2], which cold-shoulders the internal structure of arguments in favor of the entire debate’s global structure. Dung’s theory is elaborated by work in ...

Extensive Form Games

I. Review of the Notation of Extensive Form Game Theory A. Let T be a finite set of nodes, thought of as states of the game where uncertainty is resolved, a choice is made, or, in the case of terminal nodes, where the game is over and payoffs are realized. 1. The nodes in T are partially ordered by a relation called precedence. 2. Formally the precedence relation is ≺ ⊂ T × T . 3. We assume tha...

Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2021

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v35i6.16677